Abstract

The theorem of Shannon-McMillan-Breiman states that for every generating partition on an ergodic system, the exponential decay rate of the measure of cylinder sets equals the metric entropy almost everywhere (provided the entropy is finite). In this paper we prove that the measures of cylinder sets are lognormally distributed for strongly mixing systems and infinite partitions, and show that the rate of convergence is polynomial provided the fourth moment of the information function is finite. Also, unlike previous results by Ibragimov and others, which only apply to finite partitions, here we do not require any regularity of the conditional entropy function. We also obtain the law of the iterated logarithm and the weak invariance principle for the information function.

1 Introduction 

Let $\mu$ be a $T$-invariant probability measure on a space $\Omega$ on which the map $T$ acts measurably. For a measurable partition $\mathcal{A}$ one forms the $n$th join $\mathcal{A}^n = \bigvee_{j=0}^{n-1} T^{-j}\mathcal{A}$, which forms a finer partition of $\Omega$. (The atoms of $\mathcal{A}^n$ are traditionally called $n$-cylinders.) For $x \in \Omega$ we denote by $A_n(x) \in \mathcal{A}^n$ the $n$-cylinder which contains $x$. The Theorem of Shannon-McMillan-Breiman (see e.g. [23, 30]) then states that for $\mu$-almost every $x$ in $\Omega$ the limit

$$\lim_{n\to\infty} \frac{-\log\mu(A_n(x))}{n}$$

exists and equals the metric entropy $h(\mu)$, provided the entropy is finite in the case of a countably infinite partition. It is easy to see that this convergence is not uniform (not even for Bernoulli measures with weights that are not all equal). This theorem was proved for finite partitions in increasing degrees of generality in the years 1948 to 1957 and was then generalised to infinite partitions by Carleson [8] and Chung [10]. Similar results (for finite partitions) for the recurrence and waiting times were later proved by Ornstein and Weiss [26] and Nobel and Wyner [25] respectively. The limiting behaviour for recurrence times was generalised in 2002 by Ornstein and Weiss [27] to countably infinite partitions. In the present paper we are concerned with the limiting distribution of the information function $I_n(x) = -\log\mu(A_n(x))$ around its mean value.
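The convergence in the Shannon-McMillan-Breiman theorem can be observed numerically in the simplest case. The following is a minimal sketch, assuming a Bernoulli measure on three symbols with weights chosen only for illustration (the weights, the sample length and the helper names `information_rate` and `entropy` are not from the paper). For such a measure the $n$-cylinder around $x$ has measure $p_{x_1} \cdots p_{x_n}$, so $I_n(x)/n$ is an average of independent terms.

```python
import math
import random

def entropy(weights):
    """Metric entropy h = -sum p_j log p_j of the Bernoulli measure."""
    return -sum(p * math.log(p) for p in weights)

def information_rate(weights, n, seed=0):
    """Return I_n(x)/n = -log mu(A_n(x)) / n for one sampled point x."""
    rng = random.Random(seed)
    sample = rng.choices(range(len(weights)), weights=weights, k=n)
    # For a Bernoulli measure, mu(A_n(x)) is the product of the symbol weights.
    info = -sum(math.log(weights[s]) for s in sample)
    return info / n

weights = [0.5, 0.3, 0.2]   # illustrative weights (assumption)
print("entropy h        :", entropy(weights))
print("I_n(x)/n, n=10^5 :", information_rate(weights, 100_000))
```

The fluctuations of $I_n(x)$ around $nh$ that remain visible in such an experiment are the object of study in what follows.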

The statistical properties of $I_n$ are of great interest in information theory, where they are connected to the efficiency of compression schemes. Let us also note that in dynamical systems the analog of SMB's theorem for compact metric spaces is the Brin-Katok local entropy formula, which states that for ergodic invariant measures the exponential decay rate of dynamical balls is almost everywhere equal to the entropy.

There is a large classical body of work on the Central Limit Theorem (CLT) for independent random variables. For dependent random variables the first CLTs are due to Markov (for Markov chains) and Bernstein [2] for random variables that are allowed to have some short range dependency but have to be independent if separated by a suitable time difference (for more than a power of the length $n$ of the partial sums $S_n$). In 1956 Rosenblatt [35] then introduced the notions of uniform mixing and strong mixing (see below) and proved a CLT for the partial sums $S_n$ of random variables that satisfy the strong mixing property. In [36] he then proved a more general CLT for random variables on systems that satisfy a norm condition^1. Around the same time Nagaev [24] proved a convergence theorem for the stable law for strongly mixing systems. His result covers the case of the CLT and formed the basis for Ibragimov's famous 1962 paper [20] in which he proved for finite partitions 'a refinement to SMB's theorem' by showing that $I_n(x) = -\log\mu(A_n(x))$ is in the limit lognormally distributed for systems that are strongly mixing and satisfy a regularity condition akin to a Gibbs property. Based on his results and methods, Philipp and Stout [32] proved the almost sure invariance principle for the information function $I_n$ under conditions similar to those Ibragimov used (requiring faster decay rates). This result in turn was then used by Kontoyiannis [21] to prove the almost sure invariance principle, the CLT and the law of the iterated logarithm (LIL) for recurrence and waiting times, thus strengthening the result of Nobel and Wyner [25], who showed that for strongly mixing systems (without regularity condition) the exponential growth rate of waiting times equals the metric entropy.

Various improvements and refinements to the CLT for the information function have been made successively, mainly for measures that satisfy a genuine Gibbs property. For instance Collet, Galves and Schmitt [12], in order to prove the lognormal distribution of entry times for exponentially $\psi$-mixing Gibbs measures^2, needed to know that $I_n$ is in the limit lognormally distributed. A more general result is due to Paccaut [28] for maps on the interval where he had to assume some topological covering properties. For some non-uniformly hyperbolic maps on the interval similar results were formulated in [14, 17]. However all those results use explicitly the Gibbs property of the invariant measure $\mu$ to approximate the information function $I_n$ by an ergodic sum and then invoke standard results on the CLT for sufficiently regular observables (see for instance [TTIIHIIS]). (Of course the variance has to be non-zero because otherwise the limiting distribution might not be normal, as an example in [12] illustrates.)

Results that do not require the explicit Gibbs characterisation of the measure, like Kontoyiannis' paper [21], all ultimately rely on the original paper of Ibragimov [20] and require, apart from the strong mixing condition, the regularity of the Radon-Nikodym derivative of the measure under the local inverse maps. In [18] we went beyond his regularity constraint and proved a CLT with error bounds for the lognormal distribution of the information function for $(\psi, f)$-mixing systems, which included traditional $\psi$-mixing maps and also equilibrium states for rational maps with critical points in the Julia set.

The present paper is significant in two respects: (i) we allow for the partition to be countably infinite instead of finite and (ii) unlike Ibragimov (and all who followed him) we do not require an $L^1$-regularity condition for the Radon-Nikodym derivative for local inverses of the map. This condition, which was introduced in [20], is the equivalent of what otherwise would allow a transfer operator approach to analyse the invariant measure and imply the Gibbs property^3. We moreover prove that the rate of convergence is polynomial (Theorem 2) and that the variance is always positive for genuinely infinite partitions.

Let us note that convergence rates for the CLT have previously been obtained by A. Broise [6] for a large class of expanding maps on the interval for which the Perron-Frobenius operator has a 'spectral gap'. Similar estimates were obtained by Pène [29] for Gibbs measures for dispersing billiards.

This paper is structured as follows: In the second section we introduce uniform strong mixing systems and in the third section we prove the existence of the variance of strongly mixing probability measures (Proposition 14) as well as the growth rate of higher order moments (Proposition 15). This is the main part of the proof (note that Ibragimov's regularity condition was previously needed precisely to obtain the variance of the measure). In section 4 we then prove the CLT using Stein's method of exchangeable

^ The map T satisfies an norm condition if sup — decays exponentially fast as n — > oo. This is a somewhat 

f:ti(f)=0 ll/r 

stronger mixing condition than the strong mixing condition 

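^2 We say an invariant probability measure $\mu$ is Gibbs for a potential $f$ with pressure $P(f)$ if there exists a constant $c > 0$ so that

$$\frac{1}{c} \le \frac{\mu(A_n(x))}{e^{f^n(x) - nP(f)}} \le c$$

for every $x \in \Omega$ and $n = 1, 2, \dots$, where $f^n$ denotes the $n$th ergodic sum of $f$.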

^3 More precisely, Ibragimov's condition requires that the $L^1$-norms of the differences $f - f_n$ decay polynomially, where $f = \lim_{n\to\infty} f_n$ and $f_n(x) = \log P(x_0 \mid x_{-1} x_{-2} \dots x_{-n})$.

pairs. In section 5 we prove the Weak Invariance Principle for $I_n$ using the CLT and the convergence rate obtained in section 3.

I would like to thank my colleague Larry Goldstein for many conversations in which he explained 
Stein's method to me. 



2 Main results 

Let $T$ be a map on a space $\Omega$ and $\mu$ a probability measure on $\Omega$. Moreover let $\mathcal{A}$ be a (possibly infinite) measurable partition of $\Omega$ and denote by $\mathcal{A}^n = \bigvee_{j=0}^{n-1} T^{-j}\mathcal{A}$ its $n$-th join, which also is a measurable partition of $\Omega$ for every $n \ge 1$. The atoms of $\mathcal{A}^n$ are called $n$-cylinders. Let us put $\mathcal{A}^* = \bigcup_{n=1}^{\infty} \mathcal{A}^n$ for the collection of all cylinders in $\Omega$ and put $|A|$ for the length of a cylinder $A \in \mathcal{A}^*$, i.e. $|A| = n$ if $A \in \mathcal{A}^n$. We shall assume that $\mathcal{A}$ is generating, i.e. that the atoms of $\mathcal{A}^\infty$ are single points in $\Omega$.

2.1 Mixing 

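Definition 1 We say the invariant probability measure $\mu$ is uniformly strong mixing if there exists a decreasing function $\psi: \mathbb{N} \to \mathbb{R}^+$ which satisfies $\psi(\Delta) \to 0$ as $\Delta \to \infty$ so that

$$\left| \sum_{(B,C)\in S} \bigl(\mu(B\cap C) - \mu(B)\mu(C)\bigr) \right| \le \psi(\Delta)$$

for every subset $S$ of $\mathcal{A}^n \times T^{-\Delta-n}\mathcal{A}^m$ and every $n, m, \Delta > 0$.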

Various kinds of mixing^4

In the following list of different mixing properties $U$ is always in the $\sigma$-algebra generated by $\mathcal{A}^n$ and $V$ lies in the $\sigma$-algebra generated by $\mathcal{A}^*$ (see also [13]). The limiting behaviour is as the length of the 'gap' $\Delta \to \infty$.

1. $\psi$-mixing: $\sup_n \sup_{U,V} \left| \frac{\mu(U\cap T^{-\Delta-n}V)}{\mu(U)\mu(V)} - 1 \right| \to 0$.

2. Left $\phi$-mixing: $\sup_n \sup_{U,V} \left| \frac{\mu(U\cap T^{-\Delta-n}V)}{\mu(U)} - \mu(V) \right| \to 0$.

3. Strong mixing [35, 20] (also called $\alpha$-mixing): $\sup_n \sup_{U,V} \left| \mu(U\cap T^{-\Delta-n}V) - \mu(U)\mu(V) \right| \to 0$.

4. Uniform mixing [35, 36]: $\sup_n \sup_{U,V} \left| \frac{1}{k}\sum_{j=0}^{k-1} \mu(U\cap T^{-n-j}V) - \mu(U)\mu(V) \right| \to 0$ as $k \to \infty$.

One can also have right $\phi$-mixing when $\sup_n \sup_{U,V} \left| \frac{\mu(U\cap T^{-\Delta-n}V)}{\mu(V)} - \mu(U) \right| \to 0$ as $\Delta \to \infty$. Clearly $\psi$-mixing implies all the other kinds of mixing. The next strongest mixing property is $\phi$-mixing, then comes strong mixing, and uniform mixing is the weakest. The uniform strong mixing property is stronger than the strong mixing property but is implied by the dynamical $\phi$-mixing property, as we will see in Lemma 6. In fact if $\mu$ is strong mixing then the sets $S$ in Definition 1 have to be of product form.

For a partition $\mathcal{A}$ we have the ($n$-th) information function $I_n(x) = -\log\mu(A_n(x))$, where $A_n(x)$ denotes the unique $n$-cylinder that contains the point $x \in \Omega$, whose moments are

$$K_w(\mathcal{A}^n) = \sum_{A\in\mathcal{A}^n} \mu(A)\,|\log\mu(A)|^w = \mathbb{E}(I_n^w),$$

$w > 0$ not necessarily integer. (For $w = 1$ one traditionally writes $H(\mathcal{A}) = K_1(\mathcal{A}) = -\sum_{A\in\mathcal{A}} \mu(A)\log\mu(A)$.)

^4 Here we adopt probabilistic terminology which differs from the one used in the dynamical systems community.

If $\mathcal{A}$ is finite then $K_w(\mathcal{A}) < \infty$ for all $w$. For infinite partitions the theorem of Shannon-McMillan-Breiman requires that $H(\mathcal{A})$ be finite [8, 10]. In order to prove that the information function is lognormally distributed we will require that a larger than fourth moment $K_w(\mathcal{A})$ for some $w > 4$ (not necessarily integer) be finite.



2.2 Results 

For $x \in \Omega$ we denote by $A_n(x)$ the $n$-cylinder in $\mathcal{A}^n$ which contains the point $x$. We are interested in the limiting behaviour of the distribution function

$$\Xi_n(t) = \mu\left(\left\{x \in \Omega : \frac{I_n(x) - nh}{\sigma\sqrt{n}} \le t\right\}\right)$$

for real valued $t$ and a suitable positive $\sigma$, where $h$ is the metric entropy of $\mu$. The Central Limit Theorem states that this quantity converges to the normal distribution $N(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} e^{-s^2/2}\,ds$ as $n$ goes to infinity if there exists a suitable $\sigma$ which is positive. Our main result is the following theorem:

Theorem 2 Let $\mu$ be a uniformly strong mixing probability measure on $\Omega$ with respect to a countable (finite or infinite), measurable and generating partition $\mathcal{A}$ which satisfies $K_w(\mathcal{A}) < \infty$ for some $w > 4$. Assume that
tp decays at least polynomially with power > 8 + ^^^i^ . 

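Then
(I) The limit

$$\sigma^2 = \lim_{n\to\infty} \frac{K_2(\mathcal{A}^n) - H^2(\mathcal{A}^n)}{n}$$

exists and defines the variance of $\mu$. Moreover if the partition is infinite then $\sigma$ is strictly positive.
(II) If $\sigma > 0$:

$$|\Xi_n(t) - N(t)| \le \frac{C_0}{n^\kappa}$$

for all $t$ and all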

(i) K < Jq ~ ^ (p+2)(^)-2)+6 '-^ ^ decays polynomially with power p, 

(ii) n < if ip decays faster than any power. 

The variance $\sigma^2$ is determined in Proposition 14 and essentially only requires finiteness of the second moment $K_2(\mathcal{A})$. In order to obtain the rate of convergence one usually needs a higher than second moment of $I_n$. Since we use Stein's method we require the fourth moment to be finite (unlike in [18], where for finite partitions and $(\psi, f)$-mixing measures we only needed bounds on the third moment).

Throughout the paper we shall assume that $K_w(\mathcal{A}) < \infty$ for some finite $w > 4$. The case in which $w$ can be arbitrarily large (e.g. for finite partitions) is done with minor modifications and yields the obvious result for the rate of convergence. For simplicity's sake we assume in the proofs that the decay rate of $\psi$ is polynomial at some finite power $p$. The case of hyper polynomial decay can be traced out with minor modifications and yields the stated result.
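To make the statement of the CLT concrete, the distribution function of $(I_n - nh)/(\sigma\sqrt{n})$ can be computed exactly in a toy case. The sketch below assumes the Bernoulli measure on two symbols with weights $(p, 1-p)$, $p \ne 1/2$ (an illustrative choice, not an example from the paper), for which $I_n$ depends only on the number of occurrences of the first symbol; the helper names are hypothetical. It compares the exact distribution function with $N(t)$ on a grid of values of $t$ and makes no claim about the exponent $\kappa$ of the theorem.

```python
import math

def normal_cdf(t):
    """N(t), the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def standardized_law(n, p):
    """List of (z, prob): values of (I_n - n h)/(sigma sqrt(n)) and their measure."""
    q = 1.0 - p
    h = -p * math.log(p) - q * math.log(q)
    sigma = math.sqrt(p * q) * abs(math.log(p / q))   # requires p != 1/2
    law = []
    for k in range(n + 1):                            # k = occurrences of symbol 1
        info = -k * math.log(p) - (n - k) * math.log(q)
        log_prob = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(q))
        law.append(((info - n * h) / (sigma * math.sqrt(n)), math.exp(log_prob)))
    return law

def sup_error(n, p, grid=None):
    """Largest |distribution function - N(t)| over a grid of t values."""
    grid = grid or [i / 10.0 for i in range(-30, 31)]
    law = standardized_law(n, p)
    return max(abs(sum(pr for z, pr in law if z <= t) - normal_cdf(t)) for t in grid)

for n in (50, 500, 2000):
    print(n, sup_error(n, 0.3))
```

In this lattice case the discrepancy shrinks as $n$ grows, which is the qualitative behaviour the theorem quantifies for general uniformly strong mixing measures.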

If the partition $\mathcal{A}$ is finite then $K_w(\mathcal{A}) < \infty$ for all $w$ and we obtain the following corollary:



Corollary 3 Let fi be a uniformly strong mixing probability measure on with respect to a finite, mea- 

-4 ■ 



surable and generating partition A and ip decays at least polynomially with power > 8 + ^^^i- . 
Then 

(I) The limit $\sigma^2 = \lim_{n\to\infty} \frac{1}{n}\left(K_2(\mathcal{A}^n) - H^2(\mathcal{A}^n)\right)$ exists (variance of $\mu$).



(II) //. > 0.- E^it) . Nit) + Oin-n for all t and \ ^ <J T ^ ' 

I To V V decays hyper polynomially. 

By a result of Petrov [31] we now obtain the Law of the Iterated Logarithm from Theorem 2 by virtue of the error bound, which is better than the bound $(\log n)^{-1-\varepsilon}$ (some $\varepsilon > 0$) required in [31].

Corollary 4 Under the assumptions of Theorem 2:

$$\limsup_{n\to\infty} \frac{I_n(x) - nh}{\sigma\sqrt{2n\log\log n}} = 1$$

almost everywhere.

A similar statement is true for the liminf, where the limit is then equal to $-1$ almost everywhere.

Based on the Central Limit Theorem we also get the weak invariance principle WIP (see section 5).
Recently there has been a great interest in the WIP in relation to mixing properties of dynamical systems. 
For instance it has been obtained for a large class of observables and for a large class of dynamical systems 
by Chernov in [9] . Other recent results are [HI [HI [32] • Those results however are typically for sums of 
sufficiently regular observables. Here we prove the WIP for $I_n(x)$.

Theorem 5 Under the assumption of Theorem 2 the information function $I_n$ satisfies the Weak Invariance Principle (provided the variance is positive).

2.3 Examples 

(I) Bernoulli shift: Let $\Sigma$ be the full shift space over the infinite alphabet $\mathbb{N}$ and let $\mu$ be the Bernoulli measure generated by the positive weights $p_1, p_2, \dots$ ($\sum_j p_j = 1$). The entropy is then $h(\mu) = \sum_j p_j |\log p_j|$ and since $K_2(\mathcal{A}) = \sum_i p_i \log^2 p_i = \frac{1}{2}\sum_{i,j} p_i p_j (\log^2 p_i + \log^2 p_j)$ we obtain that the variance is given by the following expression which is familiar from finite alphabet Bernoulli shifts:

$$\sigma^2 = K_2(\mathcal{A}) - h(\mu)^2 = \frac{1}{2} \sum_{i,j} p_i p_j \log^2\frac{p_i}{p_j}.$$

We have used that the partition A is given by the cylinder sets whose first symbols are fixed. Here we 
naturally assume that '^^Pi log^pi < oo. If moreover '^^Pi \og^ pi < oo then 

with exponent ^ which is a well known result for unbounded iid random variables. With other techniques 
one can however weaken the moment requirement in this case. 
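The identity behind the variance formula for Bernoulli shifts, $K_2(\mathcal{A}) - h(\mu)^2 = \frac{1}{2}\sum_{i,j} p_i p_j \log^2(p_i/p_j)$, is easy to check numerically. The sketch below assumes truncated geometric weights purely for illustration (the ratio, the truncation length and the function names are not from the paper); a truncation is used because a program can only sum finitely many terms.

```python
import math

def geometric_weights(r, terms):
    """Weights p_j proportional to r**j (j = 1..terms), renormalised to sum to 1."""
    raw = [r ** j for j in range(1, terms + 1)]
    total = sum(raw)
    return [x / total for x in raw]

def variance_moment_form(p):
    """sigma^2 = K_2(A) - h^2 with K_2(A) = sum p_i log^2 p_i."""
    h = -sum(x * math.log(x) for x in p)
    k2 = sum(x * math.log(x) ** 2 for x in p)
    return k2 - h * h

def variance_pair_form(p):
    """sigma^2 = (1/2) sum_{i,j} p_i p_j log^2(p_i / p_j)."""
    return 0.5 * sum(a * b * math.log(a / b) ** 2 for a in p for b in p)

p = geometric_weights(0.5, 60)
print(variance_moment_form(p), variance_pair_form(p))
```

The pair form also makes it visible that the variance vanishes only when all weights are equal, which cannot happen for a genuinely infinite alphabet.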

(II) Markov shift: Again let $\Sigma$ be the shift space over the infinite alphabet $\mathbb{N}$ and $\mu$ the Markov measure generated by an infinite probability vector $\vec{p} = (p_1, p_2, \dots)$ ($p_j > 0$, $\sum_j p_j = 1$) and an infinite stochastic matrix $P$ ($\vec{p}P = \vec{p}$, $P\mathbf{1} = \mathbf{1}$). The partition $\mathcal{A}$ is again the partition of single element cylinder sets. If $\vec{x} = x_1 x_2 \dots x_n$ is a word of length $n$ (we write $\vec{x} \in \mathcal{A}^n$) then the measure of its cylinder set is $\mu(\vec{x}) = p_{x_1} P_{x_1 x_2} P_{x_2 x_3} \cdots P_{x_{n-1} x_n}$. The metric entropy is $h(\mu) = \sum_{i,j} -p_i P_{ij} \log P_{ij}$ [35] and the variance [33, 30] (see also Appendix) is




(III) Gibbs states: The measure $\mu$ is a Gibbs state for the potential function $f$ if there exists a constant $c > 1$ so that for every point $x \in \Omega$ and $n$ one has $\mu(A_n(x)) \in \left[\frac{1}{c}, c\right] e^{f^n(x) - nP(f)}$, where $P(f)$ is the pressure of $f$ and $f^n = f + f\circ T + \cdots + f\circ T^{n-1}$ is the $n$th ergodic sum of $f$. If $f$ is Hölder continuous and $T$ is an Axiom A map, then $\mu$ is the unique equilibrium state. In this case the CLT has been studied a great deal, in particular for finite partitions, since standard techniques for sums of random variables can be applied (see e.g. [6, 11, 22, 29]). Note that (I) and (II) are special cases of Gibbs states.

3 Variance and higher moments

3.1 Some basic properties

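Let us begin by showing that the uniform strong mixing property is implied by the $\phi$-mixing property.

Lemma 6 $\phi$-mixing implies uniformly strong mixing.

Proof. Let $\mu$ be a left $\phi$-mixing probability measure (the right $\phi$-mixing case is done in the same way). That means, there exists a decreasing $\phi(\Delta) \to 0$ as $\Delta \to \infty$ so that

$$|\mu(B\cap C) - \mu(B)\mu(C)| \le \phi(\Delta)\mu(B)$$

for every $C$ in the $\sigma$-algebra generated by $\mathcal{C} = T^{-\Delta-n}\mathcal{A}^m$ and every cylinder $B \in \mathcal{B} = \mathcal{A}^n$ for all $n$ and $\Delta$. Let $S \subset \mathcal{B}\times\mathcal{C}$ and put $S_B$ for the intersection of $\{B\}\times\mathcal{C}$ with $S$ (which we identify with the union of the sets $C$ for which $(B, C) \in S$). Then $|\mu(B\cap S_B) - \mu(B)\mu(S_B)| \le \phi(\Delta)\mu(B)$ and

$$\left| \sum_{(B,C)\in S} \bigl(\mu(B\cap C) - \mu(B)\mu(C)\bigr) \right| \le \sum_{B\in\mathcal{B}} |\mu(B\cap S_B) - \mu(B)\mu(S_B)| \le \sum_{B\in\mathcal{B}} \phi(\Delta)\mu(B) \le \phi(\Delta)$$

implies that $\mu$ is uniformly strong mixing with $\psi = \phi$. $\square$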

The following estimate has previously been shown for $\psi$-mixing measures (in which case they are exponential) in [16] and for $\phi$-mixing measures in [1]. Denote by $A_n(x)$ the atom in $\mathcal{A}^n$ ($n = 1, 2, \dots$) which contains the point $x \in \Omega$. (Abadi [1] also showed that in case (II) the decay cannot in general be exponential.)

Lemma 7 Let $\mu$ be strong mixing. Then there exists a constant $C_1$ so that for all $A \in \mathcal{A}^n$, $n = 1, 2, \dots$:

(I) $\mu(A) \le C_1 n^{-p}$ if $\psi$ is polynomially decreasing with exponent $p > 0$;

(II) $\mu(A) \le C_1 \theta^{\sqrt{n}}$ for some $\theta \in (0, 1)$ if $\psi$ is exponentially decreasing.

Proof. Fix TO > 1 so that a — maxAeA"^ m(^) is less than i and let Ai, A2, . . . be integers which will 
be determined below. We put Uj = jm + J^iZi (P^* Aq = 0) and for x E let Bj = Am(r"j-i+'"x) 
and put Cfc = nj=i ^j- Then An^ix) C Ck and 

niCk+i) = A*(Cfc n Bk+i) = /i(Cfe)A*(Sfe+i) + p{Ck,Bk+i) 

where the remainder term p(Cfc,i3fe+i) is by the mixing property in absolute value bounded by V'(Afe). 
Now we choose Aj so that ip{Aj) < a^+^. Then /i(Cfc+i) < ^{Ck)a + a^^^ implies that /i(Cfc) < c^ai 
(as ^/a < i) for some Cq > 0. 

(I) If -0 decays polynomially with power p, i.e. 'tp{t) < cit~^, then the condition ip{Aj) < a^+i is satisfied 



if we put A," 



C2a 2p 



k > 2p^^. Hence 

— ^|loga| 



for a suitable constant C2 > 0. Consequently < 030 (03 > 1) and therefore 



and from this one obtains fi{An{x)) < c^n^^ for all integers n (and some larger constant C5). 

(II) If decays exponentially, i.e. i/'(i) < cgi?* for some d € (0, 1), then we choose A^ — and 

obtain < rak + cyfc'^, which gives us fc > c^^/rvk (cs > 0) and the stretched exponential decay of the 
measure of cylindersets: 

lJi{A^{x)) < cga'='^\ 

Now put 6 = 0"^. I 
3.2 The information function and mixing properties

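The metric entropy $h$ for the invariant measure $\mu$ is $h = \lim_{n\to\infty} \frac{1}{n} H(\mathcal{A}^n)$, where $\mathcal{A}$ is a generating partition of $\Omega$ (cf. [13]), provided $H(\mathcal{A}) < \infty$. For $w \ge 1$ put $\eta_w(t) = t\,|\log t|^w$ ($\eta_w(0) = 0$). Then

$$K_w(\mathcal{B}) = \sum_{B\in\mathcal{B}} \mu(B)\,|\log\mu(B)|^w = \sum_{B\in\mathcal{B}} \eta_w(\mu(B))$$

for partitions $\mathcal{B}$. Similarly one has the conditional quantity ($\mathcal{C}$ is a partition):

$$K_w(\mathcal{C}|\mathcal{B}) = \sum_{B\in\mathcal{B}, C\in\mathcal{C}} \mu(B\cap C) \left|\log\frac{\mu(B\cap C)}{\mu(B)}\right|^w = \sum_{B,C} \mu(B)\,\eta_w\!\left(\frac{\mu(B\cap C)}{\mu(B)}\right).$$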



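Lemma 8 [18] For any two partitions $\mathcal{B}, \mathcal{C}$ for which $K_w(\mathcal{B}), K_w(\mathcal{C}) < \infty$ and $\mu(C) \le e^{-w}$ $\forall\, C \in \mathcal{C}$:

(i) $K_w(\mathcal{C}|\mathcal{B}) \le K_w(\mathcal{C})$,

(ii) $K_w(\mathcal{B}\vee\mathcal{C})^{1/w} \le K_w(\mathcal{C}|\mathcal{B})^{1/w} + K_w(\mathcal{B})^{1/w}$,

(iii) $K_w(\mathcal{B}\vee\mathcal{C})^{1/w} \le K_w(\mathcal{C})^{1/w} + K_w(\mathcal{B})^{1/w}$.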

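Proof. (i) Since $\eta_w(t)$ is concave and increasing on $[0, e^{-w}]$ and decreasing to zero on $(e^{-w}, 1]$ we have $\sum_i x_i \eta_w(a_i) \le \eta_w\left(\sum_i x_i a_i\right)$ for weights $x_i \ge 0$ ($\sum_i x_i = 1$) and numbers $a_i \in [0, 1]$ which satisfy $\sum_i x_i a_i \le e^{-w}$. Hence

$$K_w(\mathcal{C}|\mathcal{B}) = \sum_{B\in\mathcal{B}, C\in\mathcal{C}} \mu(B)\,\eta_w\!\left(\frac{\mu(B\cap C)}{\mu(B)}\right) \le \sum_{C} \eta_w\!\left(\sum_{B} \mu(B)\frac{\mu(B\cap C)}{\mu(B)}\right) = \sum_{C} \eta_w(\mu(C)) = K_w(\mathcal{C}).$$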

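(ii) The second statement follows from Minkowski's inequality on $L^w$-spaces:

$$K_w(\mathcal{B}\vee\mathcal{C})^{1/w} = \left(\sum_{B,C} \mu(B\cap C)\,|\log\mu(B\cap C)|^w\right)^{1/w} \le \left(\sum_{B,C} \mu(B\cap C)\left|\log\frac{\mu(B\cap C)}{\mu(B)}\right|^w\right)^{1/w} + \left(\sum_{B,C} \mu(B\cap C)\,|\log\mu(B)|^w\right)^{1/w} = K_w(\mathcal{C}|\mathcal{B})^{1/w} + K_w(\mathcal{B})^{1/w}.$$

(iii) This follows from (ii) and (i). $\square$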



Corollary 9 Let $w \ge 1$ and $\mathcal{A}$ so that $K_w(\mathcal{A}) < \infty$ and $\mu(A) \le e^{-w}$ $\forall\, A \in \mathcal{A}$. Then there exists a constant $C_2$ (depending on $w$) so that for all $n$

$$K_w(\mathcal{A}^n) \le C_2 n^w.$$

Proof. We want to use Lemma 8(iii) to show that the sequence $a_n = K_w(\mathcal{A}^n)^{1/w}$, $n = 1, 2, \dots$, is subadditive. The hypothesis of Lemma 8 is satisfied since $\mu(A) \le e^{-w}$ for all $A \in \mathcal{A}$. We thus obtain $K_w(\mathcal{A}^{n+m})^{1/w} \le K_w(\mathcal{A}^n)^{1/w} + K_w(\mathcal{A}^m)^{1/w}$ for all $n, m \ge 1$ and therefore subadditivity of the sequence $a_n$. Since by assumption $K_w(\mathcal{A}) < \infty$ we get that the limit $\lim_{n\to\infty} \frac{1}{n} K_w(\mathcal{A}^n)^{1/w}$ exists, is finite and equals the inf (see e.g. [38]). $\square$
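The subadditivity used in this proof can be checked by hand in a toy case. The sketch below assumes the Bernoulli measure on two symbols with weights $(p, 1-p)$ (an illustrative choice; the function names are hypothetical). This example does not satisfy the hypothesis $\mu(A) \le e^{-w}$ of the corollary; for a Bernoulli measure subadditivity of $a_n$ instead follows directly from Minkowski's inequality, because $I_{n+m} = I_n + I_m \circ T^n$. The point is only to see the mechanism $a_{n+m} \le a_n + a_m$, and hence $K_w(\mathcal{A}^n) \le K_w(\mathcal{A})\, n^w$, at work.

```python
import math

def k_w(n, p, w):
    """K_w(A^n) = sum over n-cylinders A of mu(A) |log mu(A)|^w (two-symbol Bernoulli)."""
    q = 1.0 - p
    total = 0.0
    for k in range(n + 1):
        # All cylinders with k occurrences of symbol 1 have the same measure.
        log_mu = k * math.log(p) + (n - k) * math.log(q)
        log_count = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += math.exp(log_count + log_mu) * abs(log_mu) ** w
    return total

def a(n, p, w):
    """a_n = K_w(A^n)^(1/w), the sequence shown to be subadditive."""
    return k_w(n, p, w) ** (1.0 / w)

p, w = 0.3, 4.5   # illustrative parameters (assumptions)
for n in (1, 2, 4, 8, 16):
    print(n, a(n, p, w), n * a(1, p, w))
```

The second column stays below the third, which is the bound $a_n \le n\,a_1$ that subadditivity gives.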

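The function $I_n$ has expected value $\mathbb{E}(I_n) = H(\mathcal{A}^n)$, for which we also write $H_n$, and variance $\sigma_n^2 = \sigma^2(I_n) = K_2(\mathcal{A}^n) - H_n^2$. In general, if $\mathcal{B}$ is a partition then we write $\sigma^2(\mathcal{B}) = K_2(\mathcal{B}) - H^2(\mathcal{B})$ and similarly for the conditional variance $\sigma^2(\mathcal{C}|\mathcal{B})$. Let us define the function $J_{\mathcal{B}}$ by $J_{\mathcal{B}}(B) = -\log\mu(B) - H(\mathcal{B})$ ($B \in \mathcal{B}$); then $\sigma^2(\mathcal{B}) = \sum_{B\in\mathcal{B}} \mu(B) J_{\mathcal{B}}(B)^2$ and $\int J_{\mathcal{B}}\,d\mu = 0$. For two partitions $\mathcal{B}$ and $\mathcal{C}$ we put

$$J_{\mathcal{C}|\mathcal{B}}(B\cap C) = \log\frac{\mu(B)}{\mu(B\cap C)} - H(\mathcal{C}|\mathcal{B})$$

for $(B, C) \in \mathcal{B}\times\mathcal{C}$. (This means $J_{\mathcal{C}|\mathcal{B}} = J_{\mathcal{B}\vee\mathcal{C}} - J_{\mathcal{B}}$ and $\sigma(\mathcal{C}|\mathcal{B}) = \sigma(J_{\mathcal{C}|\mathcal{B}})$.)

Lemma 10 Let $\mathcal{B}$ and $\mathcal{C}$ be two partitions. Then

$$\sigma(\mathcal{B}\vee\mathcal{C}) \le \sigma(\mathcal{C}|\mathcal{B}) + \sigma(\mathcal{B}).$$

Proof. This follows from Minkowski's inequality

$$\sigma(\mathcal{B}\vee\mathcal{C}) = \sqrt{\mu\bigl((J_{\mathcal{C}|\mathcal{B}} + J_{\mathcal{B}})^2\bigr)} \le \sqrt{\mu\bigl(J_{\mathcal{C}|\mathcal{B}}^2\bigr)} + \sqrt{\mu\bigl(J_{\mathcal{B}}^2\bigr)} = \sigma(\mathcal{C}|\mathcal{B}) + \sigma(\mathcal{B}).$$

$\square$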

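As a consequence of Lemma 8(i) one also has $K_w(\mathcal{B}\vee\mathcal{C}|\mathcal{B}) = K_w(\mathcal{C}|\mathcal{B}) \le K_w(\mathcal{C})$, which in particular implies $\sigma(\mathcal{B}\vee\mathcal{C}|\mathcal{B}) = \sigma(\mathcal{C}|\mathcal{B}) \le \sqrt{K_2(\mathcal{C})}$. As before we put $\rho(B, C) = \mu(B\cap C) - \mu(B)\mu(C)$ (and in the following we often write $\mathcal{B} = \mathcal{A}^n$ and $\mathcal{C} = T^{-\Delta-n}\mathcal{A}^n$ for integers $n, \Delta$).

The following technical lemma is central to get the variance of $\mu$ and bounds on the higher moments of $J_n = I_n - H_n$.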

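Lemma 11 Let $\mu$ be uniformly strong mixing and assume that $K_w(\mathcal{A}) < \infty$ and $\mu(A) \le e^{-w}$ $\forall\, A \in \mathcal{A}$ for some $w \ge 1$. Then for every $\beta > 1$ and $a \in [0, w)$ there exists a constant $C_4$ so that

$$\sum_{B\in\mathcal{B}, C\in\mathcal{C}} \mu(B\cap C) \left|\log\left(1 + \frac{\rho(B, C)}{\mu(B)\mu(C)}\right)\right|^a \le C_4\left(\psi(\Delta)(m+n)^{(1+a)\beta} + (m+n)^{a\beta - (\beta-1)w}\right)$$

for $\Delta < \min(n, m)$ and for all $n = 1, 2, \dots$. (As before $\mathcal{B} = \mathcal{A}^n$, $\mathcal{C} = T^{-\Delta-n}\mathcal{A}^m$.)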
Proof. Let m, n and A be as in the statement and put 

piB,C) 



£e = l^{B,C) eBxC : 2^-^ < 1 



< 2' 



€ Z. Using the strong mixing property we obtain 



BeB.cec 



log 



^ p{B)f,{C)) 



a 00 



E L,{\i\ + o{i)r 



where L, - E(B,c)e£, ^BnC). Since p{B, C) - 0{l){2' -l)ii{B)p(C) we get 0(V;(A)) - E(b,c)g£, p(^' C) 
0(1)(2^ - l)L'^ where L"^ = T.{B,c)eCt l^iB)^^)- Hence for ^ > one obtains = e'(V'(A))2-^ and 
if £ < then L^ = 0{tp{A)). Also note that if £ = then 



P{B,C) 
m(B)m(C) 



= O 



m(S)m(C) 



and 



E M^nc) 

(B,C)e£o 



log 



. , P{B,C) \ 
^ p{BMC)) 



= 0{\) E P(B,C)=0(^(A)). 

(s,c)e£o 



We separately estimate (i) for £ > 1 and (ii) for ^ < — 1: 

(i) Since ii{B n C) = (l + -jfff^) lJi{B)ii{C) we get for I > 1: 

2^-'L^ = E l^iB)KC)2^'' <Li< E l^iB)p(C)2^- = 2%' 



Thus 



{m+iiY 



E ^"i£< E ^"2%><< E r^73YV'(A)<ci^(A)(m + n)(i+'^)^. 
^=1 1=1 l=\ 
For $\ell > (m+n)^\beta$ we use that $\mu(B\cap C) \ge 2^{\ell-1}\mu(B)\mu(C)$ on $\mathcal{L}_\ell$, which implies $\mu(B)\mu(C) \le 2^{1-\ell}$ and $\mu(B\cap C) \le \min(\mu(B), \mu(C)) \le 2^{-(\ell-1)/2}$. Hence, on $\mathcal{L}_\ell$ one has $|\log\mu(B\cap C)| \ge (\ell - 1)\log\sqrt{2}$. Similarly to the previous lemma put

Dh= U (BDC) 

{B,C)eBxC, fc-l<| logAt(-BnC)|<fc 

and use Corollary [9] to get (as > Kyj{B V C)) 

OO OO 

C2iw)in + m + Ay'">K^iBWC)>Y,KDk){k-ir>C2in + mf^'"-^^ ^ /ipfc)^. 

fc=l fc=[(n+m)'^] + l 

We thus obtain (using that A < min(r;-,m)) 

OO -| 

t^Li < — ^ \\og^,{Bncr^i{Bnc) 

e={n+m)f^ (S,C)eBxC, I logAi(SnC)|>(n+m)'3 

OO 

iOg fc=[(„+m)'3] + l 

(n + to)"'' 

< C3- 



'(n + TO)(/5-i)^ 



for some C3 (which depends on w). 

(ii) For negative values of £ we use < 2^Lf < C42^ip{A) which gives 

00 

Y l^ri£<C4 5]^°2-V(A) <C5V^(A). 

£=-00 e=o 

Combining (i) and (ii) yields 
00 

i=-oc 

which concludes the proof. I 
3.3 Entropy 

The main purpose of this section is to obtain rates of convergence for the entropy (Lemma [13]). 

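Lemma 12 Under the assumptions of Lemma 11, for every $\beta > 1$ there exists a constant $C_5$ so that for all $n$:

$$|H(\mathcal{B}\vee\mathcal{C}) - (H(\mathcal{B}) + H(\mathcal{C}))| \le C_5\left(\psi(\Delta)n^{2\beta} + n^{\beta - (\beta-1)w}\right),$$

where $\mathcal{B} = \mathcal{A}^n$, $\mathcal{C} = T^{-\Delta-n}\mathcal{A}^n$.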

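Proof. Using the uniform strong mixing property $\mu(B\cap C) = \mu(B)\mu(C) + \rho(B, C)$ we obtain

$$H(\mathcal{B}\vee\mathcal{C}) = \sum_{B\in\mathcal{B}, C\in\mathcal{C}} \mu(B\cap C)\left(\log\frac{1}{\mu(B)} + \log\frac{1}{\mu(C)} - \log\left(1 + \frac{\rho(B, C)}{\mu(B)\mu(C)}\right)\right) = H(\mathcal{B}) + H(\mathcal{C}) + E,$$

where by Lemma 11 (with $a = 1$)

$$E = -\sum_{B\in\mathcal{B}, C\in\mathcal{C}} \mu(B\cap C)\log\left(1 + \frac{\rho(B, C)}{\mu(B)\mu(C)}\right) = O\left(\psi(\Delta)n^{2\beta} + n^{\beta - (\beta-1)w}\right).$$

This proves the lemma. $\square$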
Lemma 13 Under the assumptions of Lemma \ll\ there exists a constant Cq so that (Hm — H{A"^)) 



m 



< Ce — 



for all m, where 76 (0, 1 — ^j^^jriy) */ ''P decays polynomially with power p > -^^j^ and 7 G (0, 1) if ip 
decays faster than polynomially. 

Proof. Let m be an integer. Let B = A"""^, C = T- ^A''^'^ and V = T^^y^^A^ ^i-^^^ Lemma [HI 

H2u = 2i/„-A + OiH2A) + O (V'(A)u2/5 + . 

If we choose 5 e (^^1^, 1) and put /3 = then A = 0{u^) implies that V'(A)u2'3 + = 0(1). 

With A = [u^] we thus obtain H2u = 2Hu+0{A) = 2Hu+0{u^) as H2A = 0(A) and i7„_A = Hu+0{A). 
Iterating this estimate yields the following bound along exponential progression: 

i-i 
3=0 

To get bounds for arbitrary (large) integers n we do the following dyadic argument: Let n — kni + r where 
< r < TO and consider the binary expansion of: k = X]i=o^«^*' "^ti^re = 0, 1 {eg ~ 1, £ — [logj fc]). 
We also put kj = X]i=o (^^ ~ Obviously kj = fcj_i + ej2^ < 2-'+^. If ej = 1 then we separate the 
'first' block of length fcj_iTO from the 'second' block of length 2^m by a gap of length 2[(fcj_iTO)''] which 
we cut away in equal parts from the two adjacent blocks). We thus obtain {Hq — 0) 

for j = 0, 1, . . . , ^ — 1. Iterating this formula and summing over j yields 

The contribution made by the remainder of length r is easily bounded by 

\H.n - Hk„,\ < a (^"1^'=™) < cir < cito. 

Consequently 

Hn = kHm + O {m^2^) + 0(to) = fci?,„ + O {m^k) 
as 2^ < fc < 2^+^. Dividing by n and letting n go to infinity (fc 00) yields 

;j^liminf^ = :^ + 0(TO^-i) 



n^oo n TO 



for all TO large enough. 
3.4 The variance

In this section we prove part (I) of Theorem [2] and moreover obtain convergence rates which will be 
needed to prove part (II) in section 4. 

Proposition 14 Let fi be uniformly strong mixing and assume that K-u}(A) < oo and < e^™ V A G 

A for some w > 2. Assume that tp is at least polynomially decaying with power p > 6 + ;;^j32 • Then the 
limit 

= lim -a^iA'') 

n^oo n 

exists and is finite. Moreover for every rj < rjo = 2 (,^^2)(p+2^H-8 ^^^'''^ exists a constant C7 so that for all 
n G N; 



a' 



a^iA") 



n 

Moreover, if the partition A is infinite, then a is strictly positive. 

Proof. Withe = A'\C = r-"-A_4n wehaveby Lemma[Il]i/(SVC) = H{B)+H{C)+0 (^'(A)^^/? + nf^-(.0- 
and get for the variance 

a\BWC) = Yl Mi?nC) flog— i— -iJ(BVC)) 
Bei3,cec \ J 

= ^ Mi? n C) [js{B) + Jc{C) + O (^(A)n2^ + nf^-(^-^>) - log + ) ' ■ 

By Minkowski's inequality: 

a{B V C) - ^JE(B,C)\ < ci (i^(A)n2'3 + „/3-(/3-i)-^ + ^F[B,C) 
(ci > 0) where (by Lemma fTD with a = 2) 

F{B,C)^ Y: KB n C) \oe (1 + ' 'g,. ) < c. (v^(A)n^^ + n'P-i^-')-) , 

and 



E(B,C) = KBr\C){JB{B) + Jc{C)f 

= YKBnC){JB{Bf + Jc{Cf)+2G{B,C) 

B,C 

= a^{B) + c7^C) + 2GiB,C). 
Since and Jc have average zero the remainder term 

G{B,C) = Y KBr)C)MB)Jc{C) 

BeB,cec 

= Yiti{BUC)+piB,C))MB)JciC) 



B,C 



= Yp^b^^)W)Jc{C) 



B,C 
which is estimated using Schwarz' inequality as follows:

$$|G(\mathcal{B}, \mathcal{C})| \le \sum_{B,C} |\rho(B, C)| \cdot |J_{\mathcal{B}}(B)| \cdot |J_{\mathcal{C}}(C)| \le \psi(\Delta)\,\sigma(\mathcal{B})\,\sigma(\mathcal{C}).$$



Hence 



Next we fill the gap of length A for which we use Lemma [TO] and Corollary [5] 



|(7(yl^"+^) - n{B V C)| < a(r-"yl^|S V C) < ■y/X2(r-M^) = ^K2{A^) < cgA. 

Since by assumption VIA) < ceA"^ for some p > 6 + -^^^ we take can 5 = (p^2){w-2)+s ^^"^ ^ ~ 

(in particular 6 < i). Then, with A = [n^] we get ■il;{A)n^'^ + n2'9-('3-i)^ < A^. Therefore, as a{B) = 

a{C) — an (where ct„ — a{A'^)), one has 



where in the last step we took advantage of the a priori estimates from Corollary [9] (t2(_4"') < K2{A'^) < 
C2n^ and the choice of 5 which implies that ip{A)n'^ = 0(1). Since 25 < 1 one has cr^ < cgfc for all 
k and some constant cg- Given uq let us put recursively n^+i — 2nj + [rij] {j — 0,1,2,...). Then 

2^no < rij < 2^no Jlto + ^"-f^^) where the product is bounded by 



In the same fashion one shows that 



<+i - 2<. I < cynf imphes 



2^o-„o exp < < 2V„^ exp — 



Tin ' " rin 



Hence 



which simplifies to 



2^0-^0 / cio \ f^^^. 2-'ct^^ Cio 



As w > 2 one has tT„o < oo. Taking limsup as j — > oo and uq ~^ oo shows that the limit — lim„ ^ 

< C7n^(2-2(5) j-Qj. some C7. Now we obtain the statement in the 



exists and satisfies moreover 



(p-2)iw-2) 



proposition for aU ry < 2 - 25 = 2 ^p_^^^^^_^^_^g . 

In order to prove the last statement of the proposition let A be an infinite partition. If we choose 
TT-o large enough so that the error term ©(tt-q '■^ ^*-') in equation ([2]) is < i, then cr^^. > ^nja^g for all j. 
Since 

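$$\sigma_{n_0}^2 = \sum_{A} \mu(A)\log^2\mu(A) - \sum_{A,B} \mu(A)\mu(B)\log\mu(A)\log\mu(B)$$

$$= \frac{1}{2}\sum_{A,B} \mu(A)\mu(B)\left(\log^2\mu(A) + \log^2\mu(B)\right) - \sum_{A,B} \mu(A)\mu(B)\log\mu(A)\log\mu(B) = \frac{1}{2}\sum_{A,B} \mu(A)\mu(B)\log^2\frac{\mu(A)}{\mu(B)}$$

(the sums are over atoms $A, B$ of $\mathcal{A}^{n_0}$) we conclude that $\sigma_{n_0}^2 > 0$. Hence $\sigma^2 = \lim_n \frac{\sigma_n^2}{n}$ is strictly positive. $\square$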

Remarks: (i) It is well known that for finite partitions the measure has variance zero if it is a Gibbs state for a potential which is a coboundary.

(ii) This proposition implies in particular that the limit $\lim_{n\to\infty} \frac{1}{n^2} K_2(\mathcal{A}^n)$ exists and is equal to $h^2$.

(iii) An application of Chebycheff's inequality gives the large deviation type estimate ($\sigma_n = \sigma(J_n)$)



3.5 Higher order moments 

In the proof of Theorem 2 part (II) we will need estimates on the third and fourth moments of $J_n$. We first estimate the fourth moment and then use Hölder's inequality to bound the third moment. Denote by

$$M_w(\mathcal{B}) = \sum_{B\in\mathcal{B}} \mu(B)\,|J_{\mathcal{B}}(B)|^w$$

the $w$th (absolute) moment of the function $J_{\mathcal{B}}$. By Minkowski's inequality

$$M_w(\mathcal{B}\vee\mathcal{C})^{1/w} \le M_w(\mathcal{C}|\mathcal{B})^{1/w} + M_w(\mathcal{B})^{1/w},$$

where $M_w(\mathcal{C}|\mathcal{B}) = \sum_{B\in\mathcal{B}, C\in\mathcal{C}} \mu(B\cap C)\,|J_{\mathcal{C}|\mathcal{B}}(B\cap C)|^w$ are the conditional moments. It follows from Corollary 9 that the absolute moments for the joins $\mathcal{A}^n$ can roughly be bounded by $M_w(\mathcal{A}^n) \le K_w(\mathcal{A}^n) \le C_2 n^w$. This estimate however is useless to prove Theorem 2, and the purpose of the next proposition is to reduce the exponent $w$ to $w/2$ in the cases $w = 3, 4$. One can of course get these improved estimates also for $w$ larger than 4 (as long as $K_w(\mathcal{A}) < \infty$) but we don't need those higher order moments here.

Proposition 15 Let $\mu$ be uniformly strong mixing and assume that $K_w(\mathcal{A}) < \infty$ and $\mu(A) \le e^{-w}$ $\forall\, A \in$
A for some w > 4. Also assume that ip decays at least polynomially with power > 8 + Then there 

exists a constant $C_8$ so that for all $n$

$$M_4(\mathcal{A}^n) \le C_8 n^2.$$

Proof. With S = yt", C = T'-^-M" we get (by LemmalHl) H{B V C) = H{B) + H{C) + 0(?A(A)ri2^' + 
j^i-(/3-i)tu^ and with Minkowsky's inequality (on spaces) 




\B£B,C£C ^ ' 

< £;i(S,C) + (i/'(A)n2^ +n^-('3-i)«') +F/(6,C) 
where by Lemma [11] (with a = 4) 

F.iB.C)^ Y. MBnC)log«(l + ^^).o(*(A) 



BeB,cec 

and 



li{B)y.{C)J 



Ei{B,C) = l^iBnC){JBiB) + JciC)f 

BeB.cec 

= AU{B) + Mi{C) + Y,KBr\ C) (AMBfJciC) + QJeiBfJciCf + 47^(5) Jc(C)= 



B,C 
We look individually at the terms in the bracket:



J2 KBnC)JB{BfJciC) 
BeB,cec 



= Y,{KBMC)+p{B,C))JB{BfJc{C) 

B,C 

< Y.\piB,c)\-\jB{B)njcic)\ 

B,C 

because Jg and Jc have zero average Schwarz inequality. In the same way we get 



^ ^i{Bf^c)JB{B)Jc{cy 



BeB,cec 



< il;{A)a{B)AMC). 



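Moreover

$$\sum_{B\in\mathcal{B},\,C\in\mathcal{C}} \mu(B\cap C)J_{\mathcal{B}}(B)^2J_{\mathcal{C}}(C)^2 = \sum_{B,C}\left(\mu(B)\mu(C) + \rho(B,C)\right)J_{\mathcal{B}}(B)^2J_{\mathcal{C}}(C)^2 = \sigma^2(\mathcal{B})\sigma^2(\mathcal{C}) + G(\mathcal{B},\mathcal{C}),$$

where

$$|G(\mathcal{B},\mathcal{C})| = \left|\sum_{B\in\mathcal{B},\,C\in\mathcal{C}} \rho(B,C)J_{\mathcal{B}}(B)^2J_{\mathcal{C}}(C)^2\right| \le \psi(\Delta)\sigma^2(\mathcal{B})\sigma^2(\mathcal{C}).$$

Thus

$$E_4(\mathcal{B},\mathcal{C}) \le M_4(\mathcal{B}) + M_4(\mathcal{C}) + (6 + 6\psi(\Delta))\sigma^2(\mathcal{B})\sigma^2(\mathcal{C}) + 4\psi(\Delta)\left(M_3(\mathcal{B})\sigma(\mathcal{C}) + \sigma(\mathcal{B})M_3(\mathcal{C})\right).$$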

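As $\sigma^2(\mathcal{B}) = \sigma^2(\mathcal{C}) = \sigma_n^2 \le c_1 n$ (Proposition [14]) and since by assumption $\psi(\Delta) = O(\Delta^{-p})$ where $p > 8 + \frac{24}{w-4}$ we can choose $\beta = 1 + \frac{2}{w-4}$, $\alpha = \frac{1}{p}\left(4 + \frac{12}{w-4}\right)$ and put $\Delta = [n^\alpha]$. This implies $\Delta \le \sqrt{n}$ (as $\alpha < \frac12$) and $\psi(\Delta)n^{6\beta} + n^{4\beta-(\beta-1)w} = O(n^2)$. Using the a priori estimates $M_3(\mathcal{A}^n) \le K_3(\mathcal{A}^n) \le c_2 n^3$ we obtain in particular that $\psi(\Delta)\left(M_3(\mathcal{B})\sigma(\mathcal{C}) + \sigma(\mathcal{B})M_3(\mathcal{C})\right) = O(n^2)$ and therefore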

M^(B V C) = ^ Mi{C) + Mi{B) + C2n^ + O (^(A))^^'^ + n'^^C/'-i)-) , 

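where the error term on the right hand side is $O(n^{\frac12})$. To fill in the gap of length $\Delta$ we use Lemma [10] and the estimate on $K_4$ (Corollary [9]):

$$\left|M_4^{\frac14}(\mathcal{A}^{2n+\Delta}) - M_4^{\frac14}(\mathcal{B}\vee\mathcal{C})\right| \le M_4^{\frac14}(\mathcal{A}^{2n+\Delta}|\mathcal{B}\vee\mathcal{C}) \le K_4^{\frac14}(\mathcal{A}^{\Delta}) \le c_3\Delta.$$

Hence

$$M_4^{\frac14}(\mathcal{A}^{2n+\Delta}) \le \sqrt[4]{2M_4(\mathcal{A}^n) + c_2n^2} + c_3\Delta \le \sqrt[4]{2M_4(\mathcal{A}^n) + c_4n^2}$$

(as $\Delta \le \sqrt{n}$), and by induction $M_4(\mathcal{A}^n) \le C_8 n^2$ (with $C_8 \ge c_4/2$).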
A Hölder estimate lets us now estimate the third absolute moments of $J_n$ as follows.

Corollary 16 Under the assumptions of Proposition [15] there exists a constant $C_9$ so that for all $n$

$$M_3(\mathcal{A}^n) \le C_9 n^{\frac32}.$$
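The Hölder step behind Corollary 16 is the moment comparison $M_3 \le M_4^{3/4}$, valid for any probability measure, which turns the bound $M_4(\mathcal{A}^n) \le C_8 n^2$ into a bound of order $n^{3/2}$. The following is a minimal numerical sketch of that comparison only, assuming a finite partition described by an illustrative weight vector; the function name `abs_moment` and the weights in `mu` are hypothetical and do not come from the paper.

```python
import math

def abs_moment(mu, w):
    """w-th absolute moment of J = I - H for a finite partition with weights mu."""
    info = [-math.log(p) for p in mu]            # information function on each atom
    entropy = sum(p * i for p, i in zip(mu, info))
    return sum(p * abs(i - entropy) ** w for p, i in zip(mu, info))

mu = [0.5, 0.3, 0.15, 0.05]                      # hypothetical weights, sum to 1
m3 = abs_moment(mu, 3)
m4 = abs_moment(mu, 4)
# Hoelder (Lyapunov) comparison of moments: M_3 <= M_4^(3/4)
print(m3, m4 ** 0.75, m3 <= m4 ** 0.75)
```

For a partition with equal weights the information function is constant, so all centred moments vanish and the inequality is trivial.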
4 Proof of Theorem [2] (CLT for Shannon-McMillan-Breiman)



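As before $N(t)$ denotes the normal distribution with zero mean and variance one. We will first show the following result (in which $nh$ has been replaced by $H_n$ and $\sigma\sqrt{n}$ by $\sigma_n$).

Theorem 17 Under the assumptions of Theorem [2] one has:

(I) The limit $\sigma^2 = \lim_{n\to\infty} \frac{1}{n}\left(K_2(\mathcal{A}^n) - H_n^2\right)$ exists (and is positive if $|\mathcal{A}| = \infty$).

(II) If $\sigma > 0$ then

$$P\left(\frac{I_n - H_n}{\sigma_n} \le t\right) = N(t) + O\left(\frac{1}{n^\kappa}\right)$$

for all $t$ and all

(i) $\kappa < \frac{1}{10} - \frac35\,\frac{w}{(p+2)(w-2)+6}$ if $\psi$ decays polynomially with power $p$,
(ii) $\kappa < \frac{1}{10}$ if $\psi$ decays hyperpolynomially.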
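Part (I) can be checked by hand in the simplest uniformly strong mixing case, a Bernoulli measure, where $I_n$ is a sum of i.i.d. terms and hence $\frac1n\left(K_2(\mathcal{A}^n) - H_n^2\right)$ equals the variance of $-\log p$ for every $n$. The sketch below enumerates all $n$-cylinders exactly; the weights in `p` and the helper name `cylinder_stats` are illustrative assumptions, not part of the paper.

```python
import itertools
import math

def cylinder_stats(p, n):
    """Exact H_n and K_2(A^n) for a Bernoulli measure with weights p."""
    H = 0.0
    K2 = 0.0
    for word in itertools.product(range(len(p)), repeat=n):
        m = math.prod(p[i] for i in word)        # measure of the n-cylinder
        H += -m * math.log(m)
        K2 += m * math.log(m) ** 2
    return H, K2

p = [0.6, 0.3, 0.1]                              # hypothetical Bernoulli weights
h = -sum(q * math.log(q) for q in p)             # metric entropy
sigma2 = sum(q * math.log(q) ** 2 for q in p) - h ** 2   # variance of -log p
for n in range(1, 7):
    H, K2 = cylinder_stats(p, n)
    print(n, (K2 - H ** 2) / n, sigma2)
```

The enumeration grows like $3^n$ here, so this is only meant for small $n$.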

Proof of Theorem [17]. It is enough to prove the theorem with the partition $\mathcal{A}$ replaced by one of its joins $\mathcal{A}^k$ for some $k$. Since by Lemma [7] $\mu(A) \le e^{-w}$ $\forall A \in \mathcal{A}^k$ for some $k \ge 1$ we therefore replace the original partition by $\mathcal{A}^k$ and will henceforth assume that $\mu(A) \le e^{-w}$ for all $A \in \mathcal{A}$.

Theorem [17] part (I) follows from Proposition [14]. For the proof of part (II) let us assume that $\sigma$ is positive. We will use Stein's method to prove the CLT in the form of the following proposition which is modelled after [37]:



Proposition 18 [33] Let $(W, W')$ be an exchangeable pair so that $E(W) = 0$ and $\mathrm{var}(W) = 1$ and assume

$$E(W'|W) = (1-\lambda)W$$

for some $\lambda \in (0,1)$. Then for all real $t$:

$$|P(W \le t) - N(t)| \le \frac{6}{\lambda}\sqrt{\mathrm{var}\left(E((W'-W)^2|W)\right)} + 6\sqrt{\frac{1}{\lambda}E\left(|W'-W|^3\right)}.$$

We proceed in five steps: (A) We begin with a classical 'big block-small block' argument and approximate $W_n = \frac{J_n}{\sigma_n}$ by a sum of random variables which are separated by gaps. In (B) we then replace those random variables by independent random variables. In (C) we define the interchangeable pair in the usual way and estimate the terms on the right hand side of Proposition [18]. In (D) and (E) we estimate the effects the steps (A) and (B) have on the distributions.

We approximate $W_n = \frac{J_n}{\sigma_n}$ (clearly $E(W_n) = 0$, $\sigma(W_n) = 1$) by the random variable $\hat{W}_n = \frac{1}{\sqrt{r}}\sum_{j=0}^{r-1} W_m\circ T^{m'j}$ (that is $\hat{W}_n = \frac{1}{\sqrt{r}\,\sigma_m}\sum_{j=0}^{r-1} J_m\circ T^{m'j}$) where $m' = m + \Delta$ and $n = rm + (r-1)\Delta$. (For other values of $n$ not of this form we get an additional error term of the order $m'$.)
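The index bookkeeping of the big block-small block decomposition, with $r$ blocks of length $m$ separated by gaps of length $\Delta$, can be sketched as follows. The function name `block_layout` and the sample values are illustrative assumptions only.

```python
def block_layout(m, delta, r):
    """Return n = r*m + (r-1)*delta and the index ranges of the r big blocks.

    Block j covers [j*m', j*m' + m) with m' = m + delta, so consecutive
    blocks are separated by a gap of length delta.
    """
    m_prime = m + delta
    blocks = [(j * m_prime, j * m_prime + m) for j in range(r)]
    n = r * m + (r - 1) * delta
    return n, blocks

n, blocks = block_layout(m=5, delta=2, r=3)
print(n, blocks)   # 19 [(0, 5), (7, 12), (14, 19)]
```

The last block ends exactly at $n$, which is why only values of the form $n = rm + (r-1)\Delta$ need no extra correction.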



(A) If we put $\hat{\mathcal{A}}^n = \bigvee_{j=0}^{r-1} T^{-m'j}\mathcal{A}^m$ then

\\Wn-Wnh 



< 



— IIJ^" 
Cn 



r-1 

E 



Un O J 



m ] 



— \H{A^)-rHn 

On 



IrUn 



r-1 

E 



We individually estimate the four terms on the right hand side as follows:

(i) By Lemma [10] and Proposition [14]

(ii) If $\mathcal{D}_k = \bigvee_{j=0}^{k-1} T^{-m'j}\mathcal{A}^m$ then $\mathcal{D}_{k+1} = \mathcal{D}_k \vee T^{-m'k}\mathcal{A}^m$, $k = 1, 2, \ldots, r$, and by Lemma [11] ($a = 2$)

$$\sum_{D\in\mathcal{D}_k,\,A\in T^{-m'k}\mathcal{A}^m} \mu(D\cap A)\,\log^2\left(1 + \frac{\rho(D,A)}{\mu(D)\mu(A)}\right) \le c_2\left(\psi(\Delta)n^{3\beta} + n^{2\beta-(\beta-1)w}\right)$$

for $k = 1, 2, \ldots, r$. Hence (as $\mathcal{D}_1 = \mathcal{A}^m$)

r — 1 r 



J=0 



/c=l 



(iii) $|H(\hat{\mathcal{A}}^n) - rH_m| \le c_3 r\left(\psi(\Delta)n^{2\beta} + n^{\beta-(\beta-1)w}\right)$ by Lemma [11].

(iv) Since by Proposition [14] 



1 1 



Lemma [TU] and again Proposition [HI 



k„ - ^/f(T„,,| ^ m 



Vrancr„ 



r-l 


( r-l 


Jrn O T"'J" 




j=o 


2 V^=0 



we obtain that the fourth term is $O(\sqrt{r}\,m^{-\eta})$, for any $\eta < \eta_0$.
Therefore, if $n$ is large enough,

WWn-Wnh < C6 ( ^ + ^ M{A)n^0 + n2/3-(/5-i)- + ^ 



to'' 



TO** 



< C7 

as $\sigma_n \sim \sigma\sqrt{n}$ and $\beta - 1 > 0$.

(B) Now let $X_j$ for $j = 0, 1, \ldots, r-1$ be independent random variables that have the same distributions as $\frac{1}{\sqrt{r}}W_m\circ T^{m'j}$, $j = 0, 1, \ldots, r-1$. Put $D_{V_n}(t)$ for the distribution function of the random variable $V_n = \sum_{j=0}^{r-1} X_j$ and $D_{\hat{W}_n}(t)$ for the distribution function of $\hat{W}_n$. Since $V_n$ and $\hat{W}_n$ assume the same values, the difference between the distributions is given by (with $\mathcal{D}_k = \bigvee_{j=0}^{k-1} T^{-m'j}\mathcal{A}^m$ as above):



sup 
t 



D^(t)-DvAt) 



< E ••• E 

r-l 

^ E E E \KDr^A)-^^{DMA)\ 

fc=0 DeV^ AeT-^^'i'A"' 

= E E E \p(D,A)\ 

k=0 DeVk AeT-'^'i'A"^ 
by the mixing property if we assume $n$ is large enough.

(C) In order to apply Proposition [18] let us now define an interchangeable pair in the usual way by setting $V_n' = V_n - X_Y + X^*$ where $Y \in \{0, 1, \ldots, r-1\}$ is a randomly chosen index and $X^*$ is a random variable which is independent of all other random variables and has the same distribution as the $X_j$. Since the random variables $X_j$ for $j = 0, 1, \ldots, r-1$, are i.i.d., the pair $(V_n', V_n)$ is exchangeable.

Moreover

$$E(V_n'|V_n) = \left(1 - \frac{1}{r}\right)V_n$$

(i.e. $\lambda = \frac{1}{r}$).
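The linearity condition of Proposition [18] with $\lambda = \frac1r$ can be verified by exact enumeration in a toy case. The sketch below assumes centred two-valued summands (the normalisation by $\sqrt{r}$ is omitted since the relation is scale invariant); the choice $r = 3$ and the values $\pm 1$ are hypothetical and only illustrate the construction $V' = V - X_Y + X^*$.

```python
import itertools
from collections import defaultdict
from fractions import Fraction

r = 3
values = [Fraction(-1), Fraction(1)]             # hypothetical centred X_j
prob = Fraction(1, 2)

num = defaultdict(Fraction)   # sum of V' * probability, grouped by the value of V
den = defaultdict(Fraction)   # probability of each value of V
for xs in itertools.product(values, repeat=r):   # X_0, ..., X_{r-1}
    for y in range(r):                           # uniformly chosen index Y
        for x_star in values:                    # independent copy X*
            weight = prob ** r * Fraction(1, r) * prob
            v = sum(xs)
            v_prime = v - xs[y] + x_star
            num[v] += weight * v_prime
            den[v] += weight

lam = Fraction(1, r)
for v in sorted(den):
    print(v, num[v] / den[v], (1 - lam) * v)     # conditional mean vs (1 - 1/r) V
```

Exact rational arithmetic is used so that the two printed columns agree identically rather than up to rounding.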

We now estimate the two terms on the right hand side of Proposition [18] separately:
(i) The third moment term of Proposition [18] is estimated using Corollary [16]

Hence 



iEi\V;,-V^\^) = J%^0(r-i 

A y f 2 \ 

(ii) To estimate the variance term we follow Stein [37] and obtain

yar (E ( (F^ - K)'| < -^var {{Xy ~ X*)^\Xo, Xi, . . . , Xr-i) 

Since 



we get 



var(E(X|.|14)) =yar ( -E^H = Il^^r ( E ^'1 = -'^var(X2) = ^varl^o^). 



r 

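Since $X_0$ has the same distribution as $\frac{1}{\sqrt{r}\,\sigma_m}J_m$ we have $E(X_0) = 0$ and by Propositions [14] and [15]

$$\mathrm{var}(X_0^2) = \mathrm{var}\left(\frac{J_m^2}{r\sigma_m^2}\right) \le \frac{1}{r^2\sigma_m^4}M_4(\mathcal{A}^m) \le \frac{c_{10}}{r^2}.$$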



Hence 



A V r-^ .^r 

Combining the estimates (i) and (ii) yields by Proposition [TSl 

|P(K <t)~ N{t)\ < cn^ + ^ < 



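(D) Part (B) and (C) combined yield

$$|P(\hat{W}_n \le t) - N(t)| \le |P(V_n \le t) - N(t)| + \|D_{\hat{W}_n} - D_{V_n}\|_\infty.$$

Let us put $\varepsilon = \|W_n - \hat{W}_n\|_2$ and $\varepsilon' = \sup_t |P(\hat{W}_n \le t) - N(t)|$. Then ($D_{\hat{W}_n}$ is the distribution function of $\hat{W}_n$) $N(t) \le \varepsilon'$ for $t \le -|\log\varepsilon'|$ and therefore $D_{\hat{W}_n}(t) \le 2\varepsilon'$ for $t \le -|\log\varepsilon'|$ and similarly $|1 - N(t)| \le \varepsilon'$ and consequently $|1 - D_{\hat{W}_n}(t)| \le 2\varepsilon'$ for all $t \ge |\log\varepsilon'|$ we get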



logc'l 



<2|loge'| 



2|loge'|e 
and (since distribution functions are increasing)

$$\|D_{W_n} - D_{\hat{W}_n}\|_\infty \le 2|\log\varepsilon'|\,\varepsilon + 2\varepsilon'.$$



(E) To optimise the bound

$$|P(W_n \le t) - N(t)| \le |P(\hat{W}_n \le t) - N(t)| + \|D_{W_n} - D_{\hat{W}_n}\|_\infty \le 2|\log\varepsilon'|\,\varepsilon + 3\varepsilon'$$

we distinguish between the case when $\psi$ decays (i) polynomially and (ii) hyperpolynomially.

(i) Assume that ip decays polynomially with power p > 12. Let S,a £ (0, 1) and put m — [n"], A = [m^] 
(i.e. A ~ n"^ , V'(^) = 0{n~"'^P). Then (assuming n^i^ yZ-ipl^A) — 0(1) which will be satisfied once we 
choose (3 and S) 

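$$\|W_n - \hat{W}_n\|_2 \le c_{13}\left(n^{\frac12-\alpha+\alpha\delta} + n^{\frac12-\alpha+\frac32\beta-\frac12\alpha\delta p} + n^{\frac12+\beta-\alpha-\frac12(\beta-1)w} + n^{\frac12-\frac{\alpha}{2}-\alpha\eta}\right).$$

The first three terms on the right hand side are optimised by $\beta = \frac{(p+2)w}{(p+2)(w-2)+6}$ and $\alpha\delta = \frac{3w}{(p+2)(w-2)+6}$. Then $\|W_n - \hat{W}_n\|_2 \le \varepsilon$, $\varepsilon = O(n^x)$, where $x = \max\left(\frac12 - \alpha + \frac{3w}{(p+2)(w-2)+6},\ \frac12 - \frac{\alpha}{2} - \alpha\eta\right)$. The fourth term is smaller than the first three since we can assume that $\eta > \frac12$ as $w > 4$. The value of $\alpha$ is found by minimising the error term $2\varepsilon|\log\varepsilon'| + 3\varepsilon'$. Ignoring the logarithmic term we obtain $\alpha = \frac35 + \frac{12}{5}\,\frac{w}{(p+2)(w-2)+6}$ which implies

$$|P(W_n \le t) - N(t)| \le c_{14}\frac{1}{n^\kappa},$$

for any $\kappa < \frac{1}{10} - \frac35\,\frac{w}{(p+2)(w-2)+6}$. Note that $\alpha\eta > \kappa$ for all (possible) values of $p$ and $w$.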

(ii) If decays faster than any power then we can choose 6 > arbitrarily close to zero and obtain a < | 
which yields the estimate |P(W„ < t) — N{t)\ < ci^:^, for any k < j^. 

This concludes the proof since $W_n = \frac{J_n}{\sigma_n} = \frac{I_n - H_n}{\sigma_n}$. ∎

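Proof of Theorem [2]. We use Theorem [17] and have to make the following adjustments:

(i) To adjust for the difference between $H_n$ and $nh$ we use Lemma [13]:

$$P\left(\frac{I_n(x) - nh}{\sigma_n} \le t\right) = P\left(\frac{I_n(x) - H_n}{\sigma_n} \le t + O(n^{\frac12-\gamma})\right) = N(t) + O(n^{-\kappa}) + O(n^{\frac12-\gamma}).$$

Since $p$ is big enough $\gamma$ can be chosen so that $\gamma - \frac12 > \kappa$.

(ii) By Proposition [13] $\frac{\sigma_n}{\sqrt{n}} = \sigma + O(n^{-\eta})$ which yields

$$P\left(\frac{I_n(x) - nh}{\sigma\sqrt{n}} \le t\right) = P\left(\frac{I_n(x) - nh}{\sigma_n} \le t_n\right) = N(t_n) + O(n^{-\kappa}) = N(t) + O\left(n^{-\min(\eta,\kappa)}\right)$$

where $t_n = t\frac{\sigma\sqrt{n}}{\sigma_n} = t\left(1 + O(n^{-\eta})\right)$. This concludes the proof since $\eta$ can be taken to be $> \kappa$.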

5 Proof of Theorem [5] (Weak Invariance Principle) 

In order to prove the WIP for $I_n(x) = -\log\mu(A_n(x))$ denote by $W_{n,x}(t)$, $t \in [0,1]$, its interpolation

$$W_{n,x}\left(\frac{k}{n}\right) = \frac{I_k(x) - kh}{\sigma\sqrt{n}},$$

$x \in \Omega$ and linearly interpolated on each of the subintervals $\left[\frac{k}{n}, \frac{k+1}{n}\right]$. In particular $W_{n,x} \in C_\infty([0,1])$ (with supremum norm). Denote by $D_n$ the distribution of $W_{n,x}$ on $C_\infty([0,1])$, namely

$$D_n(H) = \mu(\{x \in \Omega : W_{n,x} \in H\})$$

where $H$ is a Borel subset of $C_\infty([0,1])$. The WIP then asserts that the distribution $D_n$ converges weakly to the Wiener measure, which means that $S_n = I_n - nh$ is for large $n$, and after a suitable normalization, distributed approximately as the position at time $t = 1$ of a particle in Brownian motion [3].

If we put $S_i = -\log\mu(A_i(x)) - ih(\mu)$ then two conditions have to be verified ([3] Theorem 8.1), namely

(A) The tightness condition: There exists a $\lambda > 0$ so that for every $\varepsilon > 0$ there exists an $N_0$ so that

$$P\left(\max_{0\le i\le n}|S_i| > 2\lambda\sqrt{n}\right) \le \frac{\varepsilon}{\lambda^2} \qquad (3)$$

for all $n \ge N_0$.

(B) The finite-dimensional distributions of $S_i$ converge to those of the Wiener measure.

(A) Proof of tightness: As before let Ji — li — Hi and note that ih — Hi = 0{i^~^), 1 — 76 i p(^^i) , 1), 
(Lemma [T3| is easily absorbed by the term Xy/n as 1 — 7 < i. In the usual way (cf. e.g. [3]) we get 



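$$P\left(\max_{0\le i\le n}|J_i| > 2\lambda\sqrt{n}\right) \le P\left(|J_n| > \lambda\sqrt{n}\right) + \sum_{i=0}^{n-1}\mu\left(E_i \cap \{|J_i - J_n| > \lambda\sqrt{n}\}\right)$$

where $E_i$ is the set of points $x$ so that $|J_i(x)| > 2\lambda\sqrt{n}$ and $|J_k(x)| \le 2\lambda\sqrt{n}$ for $k = 0, \ldots, i-1$. Note that $E_i$ lies in the $\sigma$-algebra generated by $\mathcal{A}^i$. Clearly the sets $E_i$ are pairwise disjoint. To estimate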
ii{Ei^{\J^- J.n\> AV^}) let us first 'open a gap' of length A < §. Let J[" = V y-'-'^y^"-'-'^ (if 
i < 2. and — A^^^ V T^M"^'^ if i > f ), denote by /„ its information function and by Hn ~ K^n) 
its entropy. Obviously Hn > Hn and moreover /i(/„ — In) — Hn — Hn < < ciA. Since by Lemma ITOl 
and Corollary [5] (as A" refines yl") 



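$$\sigma(I_n - \tilde{I}_n) = \sigma(\mathcal{A}^n|\tilde{\mathcal{A}}^n) \le \sqrt{K_2(\mathcal{A}^\Delta)} \le c_2\Delta$$

we obtain by Chebycheff's inequality ($\tilde{J}_n = \tilde{I}_n - \tilde{H}_n$)

$$P\left(|J_n - \tilde{J}_n| > \ell\right) \le \frac{\sigma^2(I_n - \tilde{I}_n)}{\ell^2} \le c_3\frac{\Delta^2}{\ell^2}. \qquad (4)$$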



By the uniform strong mixing property 



/„(5)=/.(i.)+/,_A(C)-l0g(l + ^^^) 



values are Y{B,C) — — log ( 1 + ) then by Lemma [TT] (a = 2 



for all {B,C) e A' x T~^-^j[n-t~A _ y denotes the random variable on A' x t-'-^A"-'"^ whose 

P{B,C) 

(Y) < \\Y\\l2 < Ci (^ip{A){n - A)3'9 + (n - A)^'^"'^^'^^^^ 
for P > 1 arbitrary. By Chebycheff's inequality this implies 



a'' 



(I J. - J. - J.-.-, o T--| > ^) < ^ < ^ ' ' . (5) 



Then 



{E, n {\Jn - Jzl > A^^}) < (s« n {1 J„ - J„| > £}) + (s,: n {1 J„ - J, - Jn-^-A o r+^l > £}) 

(^;^ n {I J„-,-A o r+^l > AV^- 2£}) . 
The last term on the right hand side can be estimated using the mixing property (note that $E_i$ is in the $\sigma$-algebra generated by $\mathcal{A}^i$, and $\{|J_{n-i-\Delta}\circ T^{i+\Delta}| > \lambda\sqrt{n} - 2\ell\}$ is in the $\sigma$-algebra generated by $T^{-i-\Delta}\mathcal{A}^{n-i-\Delta}$)



< fx{Ei) [ 2N 

using Theorem [17] in the last step.

We finally obtain (as $P(|J_n| > \lambda\sqrt{n}) \le 2N(\lambda) + c_4 n^{-\kappa}$)



BCE, CCT-»-^{| J„_,_a|>AVH-2£} 



Coin 



A)-M +^(A) 



max|J,|>2AV^ < 2iV(A) + C4n"'' + V ^ ( n {| J„ - J„| > £} 

0<i<n / ^ — ^ \ 

i 

V'(A)7i3^' + n2'3-(/3-i)- 



+nC4- 



Co(n- 



< 2iV(A) + cgn-'' + C6 



A2 + V(A)n3'3+n2^-('3-i)- 



£2 



2A^ 



nV'(A) 
/A^/n- 2£ 



(if A < ^ is small enough) . If ip decays at least polynomially with a power larger than 8 + then we 

can put £ ~ n", A n" and choose a' < a < ^ and > 1 (e.g. /9 = ;;2;32 , a' < ^) so that the terms on 
the right hand side which don't involve the normal probability $N$ decay polynomially in $n$. This proves the tightness condition ([3]), since for every $\varepsilon > 0$ one can find a $\lambda > 1$ so that the quadratic estimate holds for all $n$ large enough.

(B) Proof of the finite-dimensional distribution convergence: For $t \in [0,1]$ define the random variable

$$X_n(t,x) = \frac{1}{\sigma\sqrt{n}}\left(S_{[nt]}(x) + (nt - [nt])\left(S_{[nt]+1}(x) - S_{[nt]}(x)\right)\right)$$

which interpolates $S_{[nt]}$. It is defined on $\Omega$ and has values in $C_\infty([0,1])$.
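The piecewise linear interpolation that defines $X_n(t,x)$ can be sketched for a single sample path as follows. The list `S` of centred partial sums and the value `sigma = 1.0` are hypothetical inputs chosen for illustration, and the function name `interpolate` is not from the paper; at $t = 1$ the sketch simply returns the last node.

```python
import math

def interpolate(S, t, sigma):
    """X_n(t) for partial sums S[0..n] (S[0] = 0), t in [0, 1], sigma > 0."""
    n = len(S) - 1
    k = min(int(math.floor(n * t)), n)           # [nt]
    frac = n * t - k                             # nt - [nt]
    nxt = S[k + 1] if k < n else S[k]            # no step beyond the last node
    return (S[k] + frac * (nxt - S[k])) / (sigma * math.sqrt(n))

S = [0.0, 1.0, -1.0, 2.0, 0.0]                   # hypothetical centred sums, n = 4
print(interpolate(S, 0.5, 1.0))                  # node k = 2: S[2] / sqrt(4) = -0.5
print(interpolate(S, 0.625, 1.0))                # halfway between S[2] and S[3]: 0.25
```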

We must show that the distribution of (X„(t, x), Xn{t, x) — Xn{s, xj) converges to (A/'(0, <), A/'(0, t — s)) 
(0 < s < t) SuS n oo, where A/"(0,t) is the normal distribution with zero mean and variance t^. To 
prove this as well as the convergence of higher finite dimensional distributions it sufhces to show that 
Xn{t,x) — Xn{s,x) converges to J\f{0,t— s) ( 3 Theorem 3.2). We obtain by Lemma [Ol 



and by (gl), (O and Theorem [2] 



Su 



= J\, 



Jr, 



0{{nt 



J[nt] — J[ns] — J[nt]-[ns]-A ° 



>A^ < P{\j[nt]-J[nt]\>e)+P{ 

+F(|Jm-M-a| > AaV^-2£) +o((ni)^-'') 



ns]+A 



> 



< 



N 



Acr^n - 2£ 



Co 



A2 



^y[nt] - [ns] - A J 

V'(A)(ni)3/3 („^)2/3-^(;3-l)«, 



£2 



(n(t- s))'= 



0(1) 

{[nt] - [ns] - A)'^ ^ (nt)i-T 



Vt^J ' 
assuming $\gamma - \frac12 > \kappa$ and $n(t-s) \gg \Delta$. Similarly to above we used a random variable $Y$ on $\mathcal{A}^{[ns]} \times T^{-[ns]-\Delta}\mathcal{A}^{[nt]-[ns]-\Delta}$ given by $Y(B,C) = -\log\left(1 + \frac{\rho(B,C)}{\mu(B)\mu(C)}\right)$. Now let $\ell \sim n^{\alpha}$, $\Delta \sim n^{\alpha'}$ and $\alpha' < \alpha < \frac12$ and $\beta > 1$ so that the terms on the right hand side other than $N\left(\lambda/\sqrt{t-s}\right)$ decay polynomially in $n$. Hence $\frac{S_{[nt]} - S_{[ns]}}{\sigma\sqrt{n}}$ and therefore $X_n(t,x) - X_n(s,x)$ converges in distribution to $\mathcal{N}(0, \sqrt{t-s})$ as $n \to \infty$. ∎

6 Appendix (Markov chains) 

Here we compute the variance for the Markov measure on an infinite alphabet. As in section [2.3] let $\Sigma$ be the shiftspace over the alphabet $\mathbb{N}$ and $\mu$ the Markov measure generated by the probability vector $p$ and stochastic matrix $P$. Then

'^l-l E M^Xy)flog^ + El°g§^l ^A + B + C + D, 

where 

Y: f.{xMy)log'P^^lYp,p,\og'^=0{l) 

x,yeA" ij P^ 

and 

p 

^ = E E M^H27)iog^iog9^ 

3=1 s,y&A" ^y.yj+i 

n-l 

= E E {'^OgP^, 10gP:,^.x, + l + ^OgPy, lOgPy^y^^, ~ ^OgP^, \OgPy^y^_^, ~ lOg^j,, lOgP^.X. + l] 

j=l x,yeAi + '^ 
n-l 

= 2E E M(^) log log Pa;^,;^- + i +2(n- l)/l Epilog 

i=l xGA^+'^ i 

Since Markov chains are exponentially mixing [3] we get for some $\vartheta \in (0,1)$ that

E m(^) logPxi iogF^^:,^._,, = E?'' log?''' E^^'-^'j i°g-^'j + ^('^^) = -'*E^' i°gp» + 

and therefore 

P = 2EC('?^) = 



The principal term is 



n-l 



^-JE E K^umog'^^-'^Y.p^p^^pMog'^. 

^j=is,y€A" ^y^y^+^ ^ yfc£ 

Lastly we get the correction term 

c = E E M^Hy)iog^p^iog:p^ 

= 2E(--fc) E M^Hy)iog^iog^^^ 
n-1

= 2^(n-fc) ^l{x)^^{y){\OgPa:^^JOgP^^:,^^,+lOgPy^y^lOgPy^y^^, -log P:,^:, Jog Py^y^^,- log Py^y^lOg 

/ \ 

fe=i \S£^fc+i / 

2 

Since $\sigma^2 = \lim_{n\to\infty}\frac{\sigma_n^2}{n}$ we finally obtain

2 ^ . °° ^ ^ 

where the infinite sum converges because the terms (correlations) decay exponentially fast. 